In [63]:

    
import pandas as pd

Identify your problem statement, find all your datasets, identify the questions you want to answer, reach out to polling/consulting firms to work with.

Potential question--Why did these counties flip to Trump?

Explore your data to understand it--drop data that is not relevant

Look to predict something (next presidential election outcome).

Think about what would happen if more people became UNINSURED and the result that could have.

Should slcie by margin of county flip. First-fourth quartiles

Look at population counts per county.

Margin of victory/voting which way (Trump/Clinton) is more important to predict than simply whcih flipped (make that a subset)

A listing of the specific counties that flipped: http://www.npr.org/2016/11/15/502032052/lots-of-people-voted-for-obama-and-trump-heres-where-in-3-charts

Nate Silver postulates that education level is a key predctor. http://fivethirtyeight.com/features/education-not-income-predicted-who-would-vote-for-trump/?ex_cid=story-twitter

Daily Kos article: http://www.dailykos.com/story/2017/1/30/1627319/-Daily-Kos-Elections-presents-the-2016-presidential-election-results-by-congressional-district

Diversity Index scource: https://www.kaggle.com/mikejohnsonjr/us-counties-diversity-index



In [64]:

    
election = pd.read_csv('2016_election.csv')



In [65]:

    
prev_election = pd.read_csv('2012_election.csv')



In [66]:

    
ui_change = pd.read_csv('County_Data_2016.csv')



In [67]:

    
div = pd.read_csv('diversityindex.csv')



In [68]:

    
edu = pd.read_excel('education_25_older_filt.xls')

Change in education the past 10 years--find the difference between them for each county



In [69]:

    
pop = pd.read_excel('us county populations.xls')



In [70]:

    
len(edu)









    Out[70]:





3283



In [71]:

    
len(pop)









    Out[71]:





3145



In [72]:

    
pop.dtypes









    Out[72]:





state              object
county             object
est_pop_2015        int64
pop_change_2015     int64
int_mig_2015        int64
dom_mig_2015        int64
mig_2015            int64
dtype: object



In [73]:

    
div.head()









    Out[73]:






  
    
      
      Location
      Diversity-Index
      Black or African American alone, percent, 2013
      American Indian and Alaska Native alone, percent, 2013
      Asian alone, percent, 2013
      Native Hawaiian and Other Pacific Islander alone, percent,
      Two or More Races, percent, 2013
      Hispanic or Latino, percent, 2013
      White alone, not Hispanic or Latino, percent, 2013
    
  
  
    
      0
      Aleutians West Census Area, AK
      0.769346
      7.4
      13.8
      31.1
      2.3
      4.8
      14.6
      29.2
    
    
      1
      Queens County, NY
      0.742224
      20.9
      1.3
      25.2
      0.2
      2.7
      28.0
      26.7
    
    
      2
      Maui County, HI
      0.740757
      0.8
      0.6
      28.8
      10.6
      23.3
      10.7
      31.5
    
    
      3
      Alameda County, CA
      0.740399
      12.4
      1.2
      28.2
      1.0
      5.2
      22.7
      33.2
    
    
      4
      Aleutians East Borough, AK
      0.738867
      7.7
      21.8
      41.4
      0.7
      3.7
      13.5
      12.9



In [74]:

    
div = div.rename(columns={'Location':'county_state','Diversity-Index':'div_index','Black or African American alone, percent, 2013':'af_am','American Indian and Alaska Native alone, percent, 2013':'native_2013','Asian alone, percent, 2013':'asian_am','Native Hawaiian and Other Pacific Islander alone, percent,':'pac_am','Two or More Races, percent, 2013':'two_or_more_races','Hispanic or Latino, percent, 2013':'hisp_lat_am','White alone, not Hispanic or Latino, percent, 2013':'white_am'})



In [75]:

    
len(div)









    Out[75]:





3195



In [76]:

    
election.county_name.count()









    Out[76]:





3141



In [77]:

    
#Need to drop Alaska as it doesn't have any county names
election = election[election.county_name!='Alaska']
pop = pop[pop.county!='Alaska']



In [78]:

    
election = election.drop(election[[0,10]], axis=1)



In [79]:

    
election['county_state'] = election['county_name'] + ', ' + election['state_abbr']



In [80]:

    
prev_election['county_state'] = prev_election['county_name'] + ', ' + prev_election['state_abbr']



In [81]:

    
ui_change['county_state'] = ui_change['county_name'] + ', ' + ui_change['state_abbrev']



In [82]:

    
pop.head()









    Out[82]:






  
    
      
      state
      county
      est_pop_2015
      pop_change_2015
      int_mig_2015
      dom_mig_2015
      mig_2015
    
  
  
    
      0
      AL
      Alabama
      4858979
      12568
      5726
      -2268
      3458
    
    
      1
      AL
      Autauga County
      55347
      57
      19
      -140
      -121
    
    
      2
      AL
      Baldwin County
      203709
      3996
      221
      3469
      3690
    
    
      3
      AL
      Barbour County
      26489
      -326
      0
      -281
      -281
    
    
      4
      AL
      Bibb County
      22583
      34
      21
      4
      25



In [83]:

    
pop['county_state'] = pop['county'] + ', ' + pop['state']



In [84]:

    
edu['county_state'] = edu['Area name'] + ', ' + edu['State']



In [85]:

    
edu.isnull().sum()









    Out[85]:





FIPS Code                                0
State                                    0
Area name                                0
less_hs_diploma_2000                    11
hs_diploma_only_2000                    11
less_4_years_2000                       11
four_or_ higher_2000                    11
per_less_high_school diploma_2000       11
per_hs_diploma_only_2000                11
per_less_4_years_2000                   11
per_four_or_ higher_2000                11
less_high_school_diploma_2011_15        10
hs_diploma_only_2011_15                 10
less_4_years_2011_15                    10
four_or_ higher_2011_15                 10
per_less_high_school_diploma_2011_15    10
per_hs_diploma_only_2011_15             10
per_less_4_years_2011_15                10
per_four_or_higher_2011_15              10
county_state                             0
dtype: int64



In [86]:

    
edu = edu.dropna()



In [87]:

    
import seaborn as sns
import matplotlib.pyplot as plt
ax = sns.distplot(edu.per_less_high_school_diploma_2011_15, kde=False)
ax.set(xlabel='Percentage per county with less than a High School Diploma, 2011-2015', ylabel='Count')
ax.set_title('Distribution of Education Levels Across All US Counties', fontsize=16, fontname='Ubuntu')
plt.show()



In [88]:

    
ax = sns.distplot(edu.per_hs_diploma_only_2011_15, kde=False)
ax.set(xlabel='Percentage per county with only High School Diploma, 2011-2015', ylabel='Count')
ax.set_title('Distribution of Education Levels Across All US Counties', fontsize=16, fontname='Ubuntu')
plt.show()



In [89]:

    
ax = sns.distplot(edu.per_less_4_years_2011_15, kde=False)
ax.set(xlabel='Percentage per county with less than four years of college, 2011-2015', ylabel='Count')
ax.set_title('Distribution of Education Levels Across All US Counties', fontsize=16, fontname='Ubuntu')
plt.show()



In [90]:

    
ax = sns.distplot(edu.per_four_or_higher_2011_15, kde=False)
ax.set(xlabel='Percentage per county with four or more years of college, 2011-2015', ylabel='Count')
ax.set_title('Distribution of Education Levels Across All US Counties', fontsize=16, fontname='Ubuntu')
plt.show()



In [91]:

    
election['per_dem'] = election['per_dem'].apply(lambda x: x*100)
election['per_gop'] = election['per_gop'].apply(lambda x: x*100)



In [92]:

    
prev_election['per_dem_2012'] = prev_election['per_dem_2012'].apply(lambda x: x*100)
prev_election['per_gop_2012'] = prev_election['per_gop_2012'].apply(lambda x: x*100)



In [93]:

    
election['per_point_diff'] = election['per_point_diff'].apply(lambda x: float(x.strip('%')))



In [94]:

    
# Making a new column for positive and negative--if per_dem is below 50%, negative. If
# above 50%, positive.



In [95]:

    
election['election_range'] = election['per_dem'] - election['per_gop']



In [96]:

    
prev_election['election_range'] = prev_election['per_dem_2012'] - prev_election['per_gop_2012']



In [97]:

    
ax = sns.distplot(election.election_range, kde=False)
ax.set(xlabel = "(Percentage won in each county, either Republican (-) or Democrat (+))", ylabel='Count')
ax.set_title('Percent Won By Each Party Across All US Counties, 2016', fontsize=16, fontname='Ubuntu')
plt.show()

Democrats are in big trouble. Of course, this distribution doesn't mean that they're necessarily losing counties, but of those they held onto in 2016, they have a far, far weaker grasp on them than Republicans do on their side. Also, many of the Republican counties are in Red States with few electoral votes. However, for Congressional voting this is still a dangerous sign.



In [98]:

    
# What was it like in 2012? 
ax = sns.distplot(prev_election.election_range, kde=False)
ax.set(xlabel = "(negative=Republican, positive=Democrat, %)", ylabel='Count')
ax.set_title('Percent Won By Each Party Across All US Counties, 2012', fontsize=15, fontname='Ubuntu')
plt.show()
# It was already bad. But it's clearly gotten worse for Democrats.



In [99]:

    
election['slight_dem'] = election['election_range'].apply(lambda x: 0< x <= 10)
election['slight_gop'] = election['election_range'].apply(lambda x: -10 <= x < 0)
election['med_dem'] = election['election_range'].apply(lambda x: 10< x <= 25)
election['med_gop'] = election['election_range'].apply(lambda x: -25 <= x < -10)
election['strong_dem'] = election['election_range'].apply(lambda x: 25 < x <= 50)
election['strong_gop'] = election['election_range'].apply(lambda x: -50 <= x < -25)



In [100]:

    
election.head()









    Out[100]:






  
    
      
      votes_dem
      votes_gop
      total_votes
      per_dem
      per_gop
      diff
      per_point_diff
      state_abbr
      county_name
      county_state
      election_range
      slight_dem
      slight_gop
      med_dem
      med_gop
      strong_dem
      strong_gop
    
  
  
    
      29
      5908.0
      18110.0
      24661.0
      23.956855
      73.435789
      12,202
      49.48
      AL
      Autauga County
      Autauga County, AL
      -49.478934
      False
      False
      False
      False
      False
      True
    
    
      30
      18409.0
      72780.0
      94090.0
      19.565310
      77.351472
      54,371
      57.79
      AL
      Baldwin County
      Baldwin County, AL
      -57.786162
      False
      False
      False
      False
      False
      False
    
    
      31
      4848.0
      5431.0
      10390.0
      46.660250
      52.271415
      583
      5.61
      AL
      Barbour County
      Barbour County, AL
      -5.611165
      False
      True
      False
      False
      False
      False
    
    
      32
      1874.0
      6733.0
      8748.0
      21.422039
      76.966164
      4,859
      55.54
      AL
      Bibb County
      Bibb County, AL
      -55.544124
      False
      False
      False
      False
      False
      False
    
    
      33
      2150.0
      22808.0
      25384.0
      8.469902
      89.851875
      20,658
      81.38
      AL
      Blount County
      Blount County, AL
      -81.381973
      False
      False
      False
      False
      False
      False



In [101]:

    
ue_rates = pd.read_excel('Unemployment Rates.xlsx')
ue_rates = ue_rates.drop(ue_rates[[0,1,2,4,5]],axis=1)
ue_rates = ue_rates.rename(columns={'Unnamed: 3':'county_state','Unnamed: 6':'labor_force', 'Unnamed: 7':'employed','Unnamed: 8':'unemployed','Unnamed: 9':'ue_rate'})
ue_rates = ue_rates.drop(ue_rates.index[[0,1,2,3,4]])



In [102]:

    
ue_rates.labor_force = ue_rates.labor_force.astype(float)
ue_rates.employed =  ue_rates.employed.astype(float)
ue_rates.unemployed =  ue_rates.unemployed.astype(float)
ue_rates.ue_rate =  ue_rates.ue_rate.astype(float)



In [103]:

    
ue_rates.dtypes









    Out[103]:





county_state     object
labor_force     float64
employed        float64
unemployed      float64
ue_rate         float64
dtype: object



In [104]:

    
right = election.set_index('county_state')
left = ue_rates.set_index('county_state')
combined_1 = left.join(right, lsuffix='', rsuffix='_r')
combined_1 = combined_1.reset_index()



In [105]:

    
right = combined_1.set_index('county_state')
left = ui_change.set_index('county_state')
combined_2 = left.join(right, lsuffix='', rsuffix = '_r')
combined_2 = combined_2.reset_index()



In [106]:

    
right = combined_2.set_index('county_state')
left = div.set_index('county_state')
combined_3 = left.join(right, lsuffix='', rsuffix = '_r')
combined_3 = combined_3.reset_index()



In [107]:

    
right = combined_3.set_index('county_state')
left = edu.set_index('county_state')
combined_4 = left.join(right, lsuffix='', rsuffix = '_r')
combined_4 = combined_4.reset_index()



In [108]:

    
right = combined_4.set_index('county_state')
left = pop.set_index('county_state')
combined_5 = left.join(right, lsuffix='', rsuffix = '_r')
combined_5 = combined_5.reset_index()



In [109]:

    
combined_5.isnull().sum()









    Out[109]:





county_state                             0
state                                    0
county                                   0
est_pop_2015                             0
pop_change_2015                          0
int_mig_2015                             0
dom_mig_2015                             0
mig_2015                                 0
FIPS Code                                9
State                                    9
Area name                                9
less_hs_diploma_2000                     9
hs_diploma_only_2000                     9
less_4_years_2000                        9
four_or_ higher_2000                     9
per_less_high_school diploma_2000        9
per_hs_diploma_only_2000                 9
per_less_4_years_2000                    9
per_four_or_ higher_2000                 9
less_high_school_diploma_2011_15         9
hs_diploma_only_2011_15                  9
less_4_years_2011_15                     9
four_or_ higher_2011_15                  9
per_less_high_school_diploma_2011_15     9
per_hs_diploma_only_2011_15              9
per_less_4_years_2011_15                 9
per_four_or_higher_2011_15               9
div_index                               13
af_am                                   13
native_2013                             13
                                        ..
pac_am                                  13
two_or_more_races                       13
hisp_lat_am                             13
white_am                                13
county_fips                             13
county_name                             13
state_abbrev                            13
2013 uninsured rate                     13
2016 uninsured rate                     13
decrease from 2013 to 2016              13
labor_force                             27
employed                                27
unemployed                              27
ue_rate                                 27
votes_dem                               46
votes_gop                               46
total_votes                             46
per_dem                                 46
per_gop                                 46
diff                                    46
per_point_diff                          46
state_abbr                              46
county_name_r                           46
election_range                          46
slight_dem                              46
slight_gop                              46
med_dem                                 46
med_gop                                 46
strong_dem                              46
strong_gop                              46
dtype: int64



In [110]:

    
combined_5.dropna(inplace=True)



In [111]:

    
combined_5 = combined_5[combined_5.county_name_r!='Alaska']
#Just making sure Alaska isn't included



In [112]:

    
combined_5.head()









    Out[112]:






  
    
      
      county_state
      state
      county
      est_pop_2015
      pop_change_2015
      int_mig_2015
      dom_mig_2015
      mig_2015
      FIPS Code
      State
      ...
      per_point_diff
      state_abbr
      county_name_r
      election_range
      slight_dem
      slight_gop
      med_dem
      med_gop
      strong_dem
      strong_gop
    
  
  
    
      0
      Abbeville County, SC
      SC
      Abbeville County
      24932
      6
      22
      -12
      10
      45001.0
      SC
      ...
      28.25
      SC
      Abbeville County
      -28.254383
      False
      False
      False
      False
      False
      True
    
    
      1
      Acadia Parish, LA
      LA
      Acadia Parish
      62577
      79
      32
      -281
      -249
      22001.0
      LA
      ...
      56.67
      LA
      Acadia Parish
      -56.674943
      False
      False
      False
      False
      False
      False
    
    
      2
      Accomack County, VA
      VA
      Accomack County
      32973
      -25
      81
      -53
      28
      51001.0
      VA
      ...
      11.71
      VA
      Accomack County
      -11.710568
      False
      False
      False
      True
      False
      False
    
    
      3
      Ada County, ID
      ID
      Ada County
      434211
      7364
      933
      3838
      4771
      16001.0
      ID
      ...
      9.24
      ID
      Ada County
      -9.239878
      False
      True
      False
      False
      False
      False
    
    
      4
      Adair County, IA
      IA
      Adair County
      7228
      -189
      0
      -161
      -161
      19001.0
      IA
      ...
      35.36
      IA
      Adair County
      -35.355148
      False
      False
      False
      False
      False
      True
    
  

5 rows × 61 columns



In [113]:

    
election.describe()









    Out[113]:






  
    
      
      votes_dem
      votes_gop
      total_votes
      per_dem
      per_gop
      per_point_diff
      election_range
    
  
  
    
      count
      3.112000e+03
      3112.000000
      3.112000e+03
      3112.000000
      3112.000000
      3112.000000
      3112.000000
    
    
      mean
      2.006065e+04
      19622.378856
      4.174537e+04
      31.708228
      63.613409
      39.233014
      -31.905181
    
    
      std
      7.199807e+04
      40442.737492
      1.134048e+05
      15.358601
      15.651728
      20.793041
      30.883786
    
    
      min
      4.000000e+00
      57.000000
      6.400000e+01
      3.144654
      4.122067
      0.040000
      -91.636364
    
    
      25%
      1.166000e+03
      3206.000000
      4.820500e+03
      20.475924
      54.947846
      22.467500
      -54.689887
    
    
      50%
      3.153000e+03
      7164.500000
      1.094700e+04
      28.473862
      66.743096
      40.315000
      -38.217390
    
    
      75%
      9.608500e+03
      17448.250000
      2.879650e+04
      39.999326
      75.147062
      55.462500
      -14.876874
    
    
      max
      1.893770e+06
      620285.000000
      2.652072e+06
      92.846592
      95.272727
      91.640000
      88.724525



In [114]:

    
# Set up range variables
ax = sns.distplot(combined_5.election_range, kde=False)
ax.set(xlabel = "(negative=Republican, positive=Democrat, %)", ylabel='Count')
ax.set_title('Partisan Pattern per All US Counties, 2016', fontsize=16, fontname='Ubuntu')
plt.show()



In [115]:

    
len(combined_5)









    Out[115]:





3104



In [117]:

    
# All counties, not including those in Alaska.



In [118]:

    
virginia = combined_5[combined_5.state_abbr=='VA']
virginia.head()









    Out[118]:






  
    
      
      county_state
      state
      county
      est_pop_2015
      pop_change_2015
      int_mig_2015
      dom_mig_2015
      mig_2015
      FIPS Code
      State
      ...
      per_point_diff
      state_abbr
      county_name_r
      election_range
      slight_dem
      slight_gop
      med_dem
      med_gop
      strong_dem
      strong_gop
    
  
  
    
      2
      Accomack County, VA
      VA
      Accomack County
      32973
      -25
      81
      -53
      28
      51001.0
      VA
      ...
      11.71
      VA
      Accomack County
      -11.710568
      False
      False
      False
      True
      False
      False
    
    
      30
      Albemarle County, VA
      VA
      Albemarle County
      105703
      1352
      410
      675
      1085
      51003.0
      VA
      ...
      25.06
      VA
      Albemarle County
      25.056116
      False
      False
      False
      False
      True
      False
    
    
      37
      Alexandria city, VA
      VA
      Alexandria city
      153511
      2071
      2334
      -2139
      195
      51510.0
      VA
      ...
      59.03
      VA
      Alexandria city
      59.026135
      False
      False
      False
      False
      False
      False
    
    
      45
      Alleghany County, VA
      VA
      Alleghany County
      15677
      -207
      2
      -85
      -83
      51005.0
      VA
      ...
      37.07
      VA
      Alleghany County
      -37.065426
      False
      False
      False
      False
      False
      True
    
    
      56
      Amelia County, VA
      VA
      Amelia County
      12903
      118
      8
      123
      131
      51007.0
      VA
      ...
      36.30
      VA
      Amelia County
      -36.304193
      False
      False
      False
      False
      False
      True
    
  

5 rows × 61 columns



In [119]:

    
# Making swing state list based on the crucial swing states this election.

IA = combined_5[combined_5['state_abbr']==('IA')]
WI = combined_5[combined_5['state_abbr']==('WI')]
MI = combined_5[combined_5['state_abbr']==('MI')]
PA = combined_5[combined_5['state_abbr']==('PA')]
FL = combined_5[combined_5['state_abbr']==('FL')]
NC = combined_5[combined_5['state_abbr']==('NC')]
OH = combined_5[combined_5['state_abbr']==('OH')]
MN = combined_5[combined_5['state_abbr']==('MN')]
swing_states= pd.concat([IA, WI, MI, PA, FL, NC, OH, MN])
# 'IA', 'WI','MI','PA','FL','NC','OH','MN'



In [120]:

    
swing_states.head()









    Out[120]:






  
    
      
      county_state
      state
      county
      est_pop_2015
      pop_change_2015
      int_mig_2015
      dom_mig_2015
      mig_2015
      FIPS Code
      State
      ...
      per_point_diff
      state_abbr
      county_name_r
      election_range
      slight_dem
      slight_gop
      med_dem
      med_gop
      strong_dem
      strong_gop
    
  
  
    
      4
      Adair County, IA
      IA
      Adair County
      7228
      -189
      0
      -161
      -161
      19001.0
      IA
      ...
      35.36
      IA
      Adair County
      -35.355148
      False
      False
      False
      False
      False
      True
    
    
      9
      Adams County, IA
      IA
      Adams County
      3796
      -75
      0
      -80
      -80
      19003.0
      IA
      ...
      39.77
      IA
      Adams County
      -39.769452
      False
      False
      False
      False
      False
      True
    
    
      40
      Allamakee County, IA
      IA
      Allamakee County
      13886
      -175
      21
      -216
      -195
      19005.0
      IA
      ...
      24.32
      IA
      Allamakee County
      -24.323534
      False
      False
      False
      True
      False
      False
    
    
      75
      Appanoose County, IA
      IA
      Appanoose County
      12529
      -99
      -2
      -61
      -63
      19007.0
      IA
      ...
      36.38
      IA
      Appanoose County
      -36.384514
      False
      False
      False
      False
      False
      True
    
    
      106
      Audubon County, IA
      IA
      Audubon County
      5773
      -20
      0
      -19
      -19
      19009.0
      IA
      ...
      31.25
      IA
      Audubon County
      -31.251850
      False
      False
      False
      False
      False
      True
    
  

5 rows × 61 columns



In [121]:

    
ax = sns.distplot(swing_states.election_range, kde=False)
ax.set(xlabel = "Negative=Republican, Positive=Democrat (%)", ylabel='Count')
ax.set_title('Partisan Degree in All Swing State Counties, 2016', fontsize=16, fontname='Ubuntu')
plt.show()
# As expected, in swing states it's not AS bad for Democrats compared to the rest of the 
# country but still quite dire.



In [122]:

    
VA = combined_4[combined_4['state_abbr']==('VA')]
ax = sns.distplot(VA.election_range, kde=False)
ax.set(xlabel = "Party Degrees Per VA County (%), (negative=Republican, positive=Democrat)", ylabel='Count')
ax.set_title('Partisan Degree in All Virginia Counties', fontsize=15, fontname='Ubuntu')
plt.show()

Influence of Ethnicity



In [123]:

    
import matplotlib.pyplot as plt
import seaborn as sns



In [124]:

    
ax = sns.regplot(combined_5.div_index, combined_5.per_dem)
ax.set(xlabel = 'Diversity Index', ylabel = 'County Vote Percent Democrat(%)')
ax.set_title("Diversity's Contribution to Democratic Votes in All US Counties", fontsize=16)
plt.show()



In [125]:

    
ax = sns.regplot(combined_5.div_index, combined_5.per_gop)
ax.set(xlabel = 'Diversity Index', ylabel = 'County Vote Percent Republican(%)')
ax.set_title("Diversity's Contribution to Republican Votes in All US Counties", fontsize=16)
plt.show()



In [126]:

    
ue_rate_filt = combined_5[combined_5.ue_rate<=10]
ax = sns.regplot(ue_rate_filt.ue_rate, ue_rate_filt.per_dem)
ax.set(xlabel = 'Unemployment Rate (%)', ylabel = 'County Vote Percent Democrat(%)')
plt.show()
# Unemployment is not a good indicator of voting either way.



In [127]:

    
# Unemployment rate not indicative one way or the other.



In [128]:

    
ax = sns.regplot(combined_5.white_am, combined_5.per_dem)
ax.set(xlabel = 'Percentage White American(%)', ylabel = 'County Vote Percent Democrat(%)')
plt.show()



In [129]:

    
ax = sns.regplot(combined_5.white_am, combined_5.per_gop)
ax.set(xlabel = 'Percentage White American(%)', ylabel = 'County Vote Percent Republican(%)')
plt.show()
# It's scattered, but there is stil a strong correlation between percentage white 
# population and Republican vote.



In [130]:

    
ax = sns.regplot(combined_5.af_am, combined_5.per_dem)
ax.set(xlabel = 'Percentage African American(%)', ylabel = 'County Vote Percent Democrat(%)')
ax.set_title('African American Influence on 2016 Democrtic Vote in All US Counties', fontsize=15)
plt.show()



In [131]:

    
ax = sns.regplot(combined_5.af_am, combined_5.per_gop)
ax.set(xlabel = 'Percentage African American(%)', ylabel = 'County Vote Percent Republican(%)')
ax.set_title('African American Influence on 2016 Republican Vote in All US Counties', fontsize=15)
plt.show()



In [132]:

    
ax = sns.regplot(combined_5.hisp_lat_am, combined_5.per_dem)
ax.set(xlabel = 'Percentage Hispanic/Latino(%)', ylabel = 'County Vote Percent Democrat(%)')
ax.set_title('Hispanic/Latino Influence on 2016 Democratic Vote in All US Counties', fontsize=15)
plt.show()



In [133]:

    
ax = sns.regplot(combined_5.hisp_lat_am, combined_5.per_gop)
ax.set(xlabel = 'Percentage Hispanic/Latino(%)', ylabel = 'County Vote Percent Republican(%)')
ax.set_title('Hispanic/Latino Influence on 2016 Republican Vote in All US Counties', fontsize=15)
plt.show()
# A correlation is there, but it's not that strong due to the sheer amount of 
# counties with little hispanic/latino population.



In [134]:

    
ax = sns.regplot(combined_5.asian_am, combined_5.per_dem)
ax.set(xlabel = 'Percentage Asian American(%)', ylabel = 'County Vote Percent Republican(%)')
ax.set_title('Asian American Influence on 2016 Democratic Vote in All US Counties', fontsize=15)
plt.show()



In [135]:

    
ax = sns.regplot(combined_5.asian_am, combined_5.per_gop)
ax.set(xlabel = 'Percentage Asian American(%)', ylabel = 'County Vote Percent Republican(%)')
ax.set_title('Asian American Influence on 2016 Republican Vote in All US Counties', fontsize=15)
plt.show()



In [ ]:

Swing States



In [136]:

    
ax = sns.regplot(swing_states.div_index, swing_states.election_range)
ax.set(xlabel = 'Diversity Index', ylabel = 'Election Range, Neg=Republican, Pos=Democrat(%)')
ax.set_title("Diversity's Effect on Swing State Votes", fontsize=20, fontname='Ubuntu')
plt.show()



In [137]:

    
ax = sns.regplot(swing_states.div_index, swing_states.per_dem)
ax.set(xlabel = 'Diversity Index', ylabel = 'County Vote Percent Democrat(%)')
ax.set_title("Diversity's Effect on Democratic Vote in Swing States", fontsize=20, fontname='Ubuntu')
plt.show()



In [138]:

    
ax = sns.regplot(swing_states.div_index, swing_states.per_gop)
ax.set(xlabel = 'Diversity Index', ylabel = 'County Vote Percent Republican(%)')
ax.set_title("Diversity's Effect on Republican Vote in Swing States", fontsize=20, fontname='Ubuntu')
plt.show()



In [139]:

    
ax = sns.regplot(swing_states.ue_rate, swing_states.election_range)
ax.set(xlabel = 'Unemployment Rate(%)', ylabel = 'Election Range(%)')
plt.show()



In [140]:

    
# No discernable realtionship for unemployment in the swing states, just as in the overall dataset.



In [141]:

    
ax = sns.regplot(swing_states.white_am, swing_states.per_dem)
ax.set(xlabel = 'Percentage White American(%)', ylabel = 'County Vote Percent Democrat(%)')
ax.set_title("White Americans' Contribtuion to 2016 Swing State Democratic Vote", fontsize=16)
plt.show()



In [142]:

    
ax = sns.regplot(swing_states.white_am, swing_states.per_gop)
ax.set(xlabel = 'Percentage White American(%)', ylabel = 'County Vote Percent Republican(%)')
ax.set_title("White Americans' Contribtuion to 2016 Swing State Republican Vote", fontsize=16)
plt.show()



In [143]:

    
# Look for how incomes of white americans influence how they vote.



In [144]:

    
ax = sns.regplot(swing_states.af_am, swing_states.per_dem)
ax.set(xlabel = 'Percentage African American(%)', ylabel = 'County Vote Percent Democrat(%)')
ax.set_title('African American Influence on 2016 Democratic Vote in Swing State Counties', fontsize=15)
plt.show()



In [145]:

    
ax = sns.regplot(swing_states.af_am, swing_states.per_gop)
ax.set(xlabel = 'Percentage African American(%)', ylabel = 'County Vote Percent Republican(%)')
ax.set_title('African American Influence on 2016 Republican Vote in Swing State Counties', fontsize=15)
plt.show()



In [146]:

    
ax = sns.regplot(swing_states.hisp_lat_am, swing_states.per_dem)
ax.set(xlabel = 'Percentage Hispanic/Latino(%)', ylabel = 'County Vote Percent Democrat(%)')
plt.show()



In [147]:

    
# Again, a scattered, but string correlation.



In [148]:

    
# The change in the uninsured rate does not appear to have benefitted Democrats, 
# but does appear to have benefitted Republicans.

Influence of Education



In [149]:

    
edu.columns









    Out[149]:





Index([                           u'FIPS Code',
                                      u'State',
                                  u'Area name',
                       u'less_hs_diploma_2000',
                       u'hs_diploma_only_2000',
                          u'less_4_years_2000',
                       u'four_or_ higher_2000',
          u'per_less_high_school diploma_2000',
                   u'per_hs_diploma_only_2000',
                      u'per_less_4_years_2000',
                   u'per_four_or_ higher_2000',
           u'less_high_school_diploma_2011_15',
                    u'hs_diploma_only_2011_15',
                       u'less_4_years_2011_15',
                    u'four_or_ higher_2011_15',
       u'per_less_high_school_diploma_2011_15',
                u'per_hs_diploma_only_2011_15',
                   u'per_less_4_years_2011_15',
                 u'per_four_or_higher_2011_15',
                               u'county_state'],
      dtype='object')



In [150]:

    
ax = sns.regplot(combined_5.per_hs_diploma_only_2011_15, combined_5.per_gop)
ax.set(xlabel = 'High School Diploma Only(%)', ylabel = 'County Vote Percent Republican(%)')
ax.set_title("Lower Education's Contribution to 2016 Republican Vote in All US Counties", fontsize=16)
plt.show()



In [151]:

    
ax = sns.regplot(combined_5.per_four_or_higher_2011_15, combined_5.per_gop)
ax.set(xlabel = 'Four or more University Years(%)', ylabel = 'County Vote Percent Republican(%)')
ax.set_title("Higher Education's Contribution to 2016 Republican Vote in All US Counties", fontsize=16)
plt.show()



In [152]:

    
ax = sns.regplot(combined_5.per_hs_diploma_only_2011_15, combined_5.per_dem)
ax.set(xlabel = 'High School Diploma Only(%)', ylabel = 'County Vote Percent Democrat(%)')
ax.set_title("Lower Education's Contribution to 2016 Democratic Vote", fontsize=16)
plt.show()



In [153]:

    
ax = sns.regplot(combined_5.per_four_or_higher_2011_15, combined_5.per_dem)
ax.set(xlabel = 'Four or more University Years(%)', ylabel = 'County Vote Percent Republican(%)')
ax.set_title("Higher Education's Contribution to 2016 Democratic Vote in All US Counties", fontsize=16)
plt.show()



In [154]:

    
ax = sns.regplot(combined_5.per_hs_diploma_only_2011_15, combined_5.election_range)
ax.set(xlabel = 'High School Diploma Only per County(%)', ylabel = 'Election Range (neg=Rep, pos=Dem, %)')
ax.set_title("Lower Education's Contribution to 2016 Vote", fontsize=16)
plt.show()



In [155]:

    
ax = sns.regplot(combined_5.per_four_or_higher_2011_15, combined_5.election_range)
ax.set(xlabel = 'Four or more University Years per County(%)', ylabel = 'Election Range (neg=Rep, pos=Dem, %)')
ax.set_title("Higher Education's Contribution to 2016 Vote in All US Counties", fontsize=16)
plt.show()

Swing States



In [156]:

    
ax = sns.regplot(swing_states.per_hs_diploma_only_2011_15, swing_states.election_range)
ax.set(xlabel = 'High School Diploma Only per County(%)', ylabel = 'Election Range (neg=Rep, pos=Dem, %)')
ax.set_title("Lower Education's Contribution to 2016 Vote in Swing State Counties", fontsize=16)
plt.show()



In [157]:

    
ax = sns.regplot(swing_states.per_four_or_higher_2011_15, swing_states.election_range)
ax.set(xlabel = 'Four or more University Years per County(%)', ylabel = 'Election Range (neg=Rep, pos=Dem, %)')
ax.set_title("Higher Education's Contribution to 2016 Vote in Swing State Counties", fontsize=16)
plt.show()



In [158]:

    
# If a county has a higher percentage of people with only a hs diploma, then more likely
# to vote Republican. If a county has a higher proportion of 4+ college degrees, then 
# more likely to go Democrat. Pretty much aligns with Nat Silver's argument.



In [159]:

    
combined_5.labor_force.head()









    Out[159]:





0     10423.0
1     26186.0
2     15972.0
3    217281.0
4      4266.0
Name: labor_force, dtype: float64

Labor Force



In [160]:

    
ax = sns.regplot(combined_5.labor_force, combined_5.election_range)
ax.set(xlabel = 'Labor Force Body per County', ylabel = 'Election Range(neg=Rep, pos=Dem, %)')
ax.set_title("Labor Force Contribution to Votes in All counties", fontsize=16)
plt.show()

Population



In [161]:

    
combined_5.head(1)









    Out[161]:






  
    
      
      county_state
      state
      county
      est_pop_2015
      pop_change_2015
      int_mig_2015
      dom_mig_2015
      mig_2015
      FIPS Code
      State
      ...
      per_point_diff
      state_abbr
      county_name_r
      election_range
      slight_dem
      slight_gop
      med_dem
      med_gop
      strong_dem
      strong_gop
    
  
  
    
      0
      Abbeville County, SC
      SC
      Abbeville County
      24932
      6
      22
      -12
      10
      45001.0
      SC
      ...
      28.25
      SC
      Abbeville County
      -28.254383
      False
      False
      False
      False
      False
      True
    
  

1 rows × 61 columns



In [162]:

    
ax = sns.regplot(combined_5.est_pop_2015, combined_5.election_range)
ax.set(xlabel = 'Popularion per County (2015)', ylabel = '2016 Election Range(neg=Rep, pos=Dem, %)')
ax.set_title("Population Contribution to Votes in All Counties", fontsize=16)
plt.show()



In [163]:

    
# Population size per county does correlate with vote.



In [164]:

    
ax = sns.regplot(combined_5.pop_change_2015, combined_5.election_range)                                                                                              
ax.set(xlabel = 'Population Change per County(2015)', ylabel = '2016 Election Range(neg=Rep, pos=Dem, %)')
ax.set_title("Population Change Contribution to Votes in All Counties", fontsize=16)
plt.show()



In [166]:

    
# Counties that experienced a positve change in population saw a boost for Dems.



In [167]:

    
# Although there is that cluster towards zero, and the correlation is broad, there
# is still something there.

Modeling

Regression

Most predictive features for counties' vote found through EDA:

(note that these variables, sometimes by their nature, don't necessarily follow a normal distribution)

Percentage White American population

Percentage African American population

Percentage Asian American population

Percentage High School Diploma only

Percentage Four or more years of University



In [169]:

    
modeling = combined_5.drop(combined_5[[0,1,2,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,25,34,35,36,37,38,39,40,52,53]], axis=1)



In [170]:

    
modeling.head()









    Out[170]:






  
    
      
      est_pop_2015
      pop_change_2015
      per_hs_diploma_only_2011_15
      per_four_or_higher_2011_15
      div_index
      af_am
      native_2013
      asian_am
      pac_am
      two_or_more_races
      ...
      per_gop
      diff
      per_point_diff
      election_range
      slight_dem
      slight_gop
      med_dem
      med_gop
      strong_dem
      strong_gop
    
  
  
    
      0
      24932
      6
      37.5
      12.3
      0.445417
      28.2
      0.3
      0.4
      0.0
      1.3
      ...
      62.868333
      3,030
      28.25
      -28.254383
      False
      False
      False
      False
      False
      True
    
    
      1
      62577
      79
      39.2
      10.5
      0.355956
      18.3
      0.3
      0.4
      0.0
      1.3
      ...
      77.262105
      15,521
      56.67
      -56.674943
      False
      False
      False
      False
      False
      False
    
    
      2
      32973
      -25
      39.9
      18.8
      0.539878
      28.0
      0.6
      0.6
      0.2
      1.5
      ...
      54.471596
      1,845
      11.71
      -11.710568
      False
      False
      False
      True
      False
      False
    
    
      3
      434211
      7364
      21.4
      37.1
      0.256622
      1.3
      0.8
      2.6
      0.2
      2.6
      ...
      47.931611
      18,072
      9.24
      -9.239878
      False
      True
      False
      False
      False
      False
    
    
      4
      7228
      -189
      44.7
      15.3
      0.054921
      0.2
      0.1
      0.4
      0.0
      0.7
      ...
      65.336526
      1,329
      35.36
      -35.355148
      False
      False
      False
      False
      False
      True
    
  

5 rows × 29 columns



In [172]:

    
modeling.dropna(inplace=True)
#Only 46 isn't too significant.



In [173]:

    
from sklearn.cross_validation import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import LinearRegression
from sklearn.cross_validation import cross_val_score
from sklearn.metrics import confusion_matrix, mean_squared_error



In [174]:

    
lr = LinearRegression()



In [175]:

    
modeling.columns









    Out[175]:





Index([               u'est_pop_2015',             u'pop_change_2015',
       u'per_hs_diploma_only_2011_15',  u'per_four_or_higher_2011_15',
                         u'div_index',                       u'af_am',
                       u'native_2013',                    u'asian_am',
                            u'pac_am',           u'two_or_more_races',
                       u'hisp_lat_am',                 u'labor_force',
                          u'employed',                  u'unemployed',
                           u'ue_rate',                   u'votes_dem',
                         u'votes_gop',                 u'total_votes',
                           u'per_dem',                     u'per_gop',
                              u'diff',              u'per_point_diff',
                    u'election_range',                  u'slight_dem',
                        u'slight_gop',                     u'med_dem',
                           u'med_gop',                  u'strong_dem',
                        u'strong_gop'],
      dtype='object')



In [176]:

    
X = modeling[[0,1,2,3,4,5,6,7,8,9,10,11]] 
y = modeling['election_range']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=99)



In [177]:

    
X.head(0)









    Out[177]:






  
    
      
      est_pop_2015
      pop_change_2015
      per_hs_diploma_only_2011_15
      per_four_or_higher_2011_15
      div_index
      af_am
      native_2013
      asian_am
      pac_am
      two_or_more_races
      hisp_lat_am
      labor_force



In [178]:

    
lr.fit(X_train,y_train)
y_pred = lr.predict(X_test)



In [179]:

    
ax = sns.regplot(y_test, y_pred)
ax.set(xlabel = 'Predicted Election Range (neg=Rep, pos=Dem)', ylabel = 'Actual Election Range(neg=Rep, pos=Dem)')
ax.set_title("Predicted vs. Actual Election Ranges for All Counties", fontsize=16)
plt.show()



In [180]:

    
lr.score(X_train, y_train)









    Out[180]:





0.66806575723520978

Model Swing States



In [181]:

    
s_modeling = swing_states.drop(swing_states[[0,1,2,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,25,34,35,36,37,38,39,40,52,53]], axis=1)



In [182]:

    
swing_states.head(0)









    Out[182]:






  
    
      
      county_state
      state
      county
      est_pop_2015
      pop_change_2015
      int_mig_2015
      dom_mig_2015
      mig_2015
      FIPS Code
      State
      ...
      per_point_diff
      state_abbr
      county_name_r
      election_range
      slight_dem
      slight_gop
      med_dem
      med_gop
      strong_dem
      strong_gop
    
  
  
  

0 rows × 61 columns



In [183]:

    
s_modeling.head(0)









    Out[183]:






  
    
      
      est_pop_2015
      pop_change_2015
      per_hs_diploma_only_2011_15
      per_four_or_higher_2011_15
      div_index
      af_am
      native_2013
      asian_am
      pac_am
      two_or_more_races
      ...
      per_gop
      diff
      per_point_diff
      election_range
      slight_dem
      slight_gop
      med_dem
      med_gop
      strong_dem
      strong_gop
    
  
  
  

0 rows × 29 columns



In [184]:

    
X = s_modeling[[0,1,2,3,4,5,6,7,8,9,10,11]] 
y = s_modeling['election_range']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=99)



In [185]:

    
X.head()









    Out[185]:






  
    
      
      est_pop_2015
      pop_change_2015
      per_hs_diploma_only_2011_15
      per_four_or_higher_2011_15
      div_index
      af_am
      native_2013
      asian_am
      pac_am
      two_or_more_races
      hisp_lat_am
      labor_force
    
  
  
    
      4
      7228
      -189
      44.7
      15.3
      0.054921
      0.2
      0.1
      0.4
      0.0
      0.7
      1.5
      4266.0
    
    
      9
      3796
      -75
      39.1
      15.1
      0.058873
      0.3
      0.5
      0.6
      0.0
      0.6
      1.1
      2300.0
    
    
      40
      13886
      -175
      42.1
      16.3
      0.159016
      1.5
      0.6
      0.5
      0.3
      1.0
      5.8
      7727.0
    
    
      75
      12529
      -99
      36.3
      17.6
      0.074125
      0.6
      0.3
      0.3
      0.0
      1.1
      1.6
      6255.0
    
    
      106
      5773
      -20
      42.3
      14.3
      0.049200
      0.4
      0.2
      0.5
      0.0
      0.7
      0.9
      3251.0



In [186]:

    
lr.fit(X_train,y_train)
y_pred = lr.predict(X_test)



In [187]:

    
ax = sns.regplot(y_test, y_pred)
ax.set(xlabel = 'Predicted Election Range (neg=Rep, pos=Dem)', ylabel = 'Actual Election Range(neg=Rep, pos=Dem)')
ax.set_title("Predicted vs. Actual Election Ranges for Swing State Counties", fontsize=16)
plt.show()



In [188]:

    
lr.score(X_train, y_train)
# Right around the same R^2 score as all counties.









    Out[188]:





0.66209473643341399

Classification

Now we want to see what features classify a county into being "slight dem", "slight gop, "med_dem", "med_gop", "strong_dem", and "strong_gop."



In [189]:

    
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report, precision_score, recall_score, roc_curve, auc



In [190]:

    
# Setting the number of neighbors to the square root of number of instances is a good 
# rule of thumb.
knn = KNeighborsClassifier(n_neighbors = 55)
rfc = RandomForestClassifier(max_depth = 5)



In [193]:

    
dummies = pd.get_dummies(modeling[['slight_dem','slight_gop','med_dem','med_gop','strong_dem','strong_gop']])
c_modeling = modeling.join(dummies)
c_modeling = c_modeling.reset_index()
c_modeling = c_modeling.drop(c_modeling[[0]], axis=1)



In [195]:

    
c_modeling.columns









    Out[195]:





Index([               u'est_pop_2015',             u'pop_change_2015',
       u'per_hs_diploma_only_2011_15',  u'per_four_or_higher_2011_15',
                         u'div_index',                       u'af_am',
                       u'native_2013',                    u'asian_am',
                            u'pac_am',           u'two_or_more_races',
                       u'hisp_lat_am',                 u'labor_force',
                          u'employed',                  u'unemployed',
                           u'ue_rate',                   u'votes_dem',
                         u'votes_gop',                 u'total_votes',
                           u'per_dem',                     u'per_gop',
                              u'diff',              u'per_point_diff',
                    u'election_range',                  u'slight_dem',
                        u'slight_gop',                     u'med_dem',
                           u'med_gop',                  u'strong_dem',
                        u'strong_gop',            u'slight_dem_False',
                   u'slight_dem_True',            u'slight_gop_False',
                   u'slight_gop_True',               u'med_dem_False',
                      u'med_dem_True',               u'med_gop_False',
                      u'med_gop_True',            u'strong_dem_False',
                   u'strong_dem_True',            u'strong_gop_False',
                   u'strong_gop_True'],
      dtype='object')



In [196]:

    
# Swing State Classifiers
dummies = pd.get_dummies(s_modeling[['slight_dem','slight_gop','med_dem','med_gop','strong_dem','strong_gop']])
cs_modeling = s_modeling.join(dummies)
cs_modeling = cs_modeling.reset_index()
cs_modeling = cs_modeling.drop(c_modeling[[0]], axis=1)

First test for slight dem and slight gop.



In [197]:

    
# First try KNN for just slight dem and slight gop.
X = c_modeling[[0,1,2,3,4,5,6,7,8,9,10,11]]
y = c_modeling[[29,30,31,32]]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)



In [198]:

    
X.head()









    Out[198]:






  
    
      
      est_pop_2015
      pop_change_2015
      per_hs_diploma_only_2011_15
      per_four_or_higher_2011_15
      div_index
      af_am
      native_2013
      asian_am
      pac_am
      two_or_more_races
      hisp_lat_am
      labor_force
    
  
  
    
      0
      24932
      6
      37.5
      12.3
      0.445417
      28.2
      0.3
      0.4
      0.0
      1.3
      1.2
      10423.0
    
    
      1
      62577
      79
      39.2
      10.5
      0.355956
      18.3
      0.3
      0.4
      0.0
      1.3
      2.0
      26186.0
    
    
      2
      32973
      -25
      39.9
      18.8
      0.539878
      28.0
      0.6
      0.6
      0.2
      1.5
      9.0
      15972.0
    
    
      3
      434211
      7364
      21.4
      37.1
      0.256622
      1.3
      0.8
      2.6
      0.2
      2.6
      7.5
      217281.0
    
    
      4
      7228
      -189
      44.7
      15.3
      0.054921
      0.2
      0.1
      0.4
      0.0
      0.7
      1.5
      4266.0



In [199]:

    
y.head()









    Out[199]:






  
    
      
      slight_dem_False
      slight_dem_True
      slight_gop_False
      slight_gop_True
    
  
  
    
      0
      1.0
      0.0
      1.0
      0.0
    
    
      1
      1.0
      0.0
      1.0
      0.0
    
    
      2
      1.0
      0.0
      1.0
      0.0
    
    
      3
      1.0
      0.0
      0.0
      1.0
    
    
      4
      1.0
      0.0
      1.0
      0.0



In [200]:

    
knn.fit(X_train, y_train)









    Out[200]:





KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=55, p=2,
           weights='uniform')



In [201]:

    
y_pred = knn.predict(X_test)



In [202]:

    
print knn.score(X_train,y_train)
print accuracy_score(y_test, y_pred)
print cross_val_score(knn, X_train, y_train, cv=5)
print(classification_report(y_test,y_pred))









    



0.892871526379
0.901771336554
[ 0.90342052  0.90140845  0.8832998   0.875       0.90120968]
             precision    recall  f1-score   support

          0       0.95      1.00      0.98       591
          1       0.00      0.00      0.00        30
          2       0.95      1.00      0.97       590
          3       0.00      0.00      0.00        31

avg / total       0.90      0.95      0.93      1242







    



/Applications/anaconda/lib/python2.7/site-packages/sklearn/metrics/classification.py:1074: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples.
  'precision', 'predicted', average, warn_for)



In [ ]:

Now test for medium gop and medium dem.



In [203]:

    
#KNN for med_dem and med_gop
X = c_modeling[[0,1,2,3,4,5,6,7,8,9,10,11]]
y = c_modeling[[33,34,35,36]]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
y_pred = knn.predict(X_test)



In [204]:

    
print knn.score(X_train,y_train)
print accuracy_score(y_test, y_pred)
print cross_val_score(knn, X_train, y_train, cv=5)
print(classification_report(y_test,y_pred))









    



0.826016915022
0.811594202899
[ 0.82293763  0.81891348  0.81488934  0.83870968  0.83467742]
             precision    recall  f1-score   support

          0       0.95      1.00      0.97       589
          1       0.00      0.00      0.00        32
          2       0.86      1.00      0.93       536
          3       0.00      0.00      0.00        85

avg / total       0.82      0.91      0.86      1242

Now test for strong gop and strong dem.



In [205]:

    
#KNN for strong dem and stronggop
X = c_modeling[[0,1,2,3,4,5,6,7,8,9,10,11]]
y = c_modeling[[37,38,39,40]]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
y_pred = knn.predict(X_test)



In [206]:

    
print knn.score(X_train,y_train)
print accuracy_score(y_test, y_pred)
print cross_val_score(knn, X_train, y_train, cv=5)
print(classification_report(y_test,y_pred))









    



0.627064035441
0.631239935588
[ 0.56740443  0.64788732  0.64788732  0.64717742  0.60685484]
             precision    recall  f1-score   support

          0       0.95      1.00      0.98       593
          1       0.00      0.00      0.00        28
          2       0.68      1.00      0.81       420
          3       0.00      0.00      0.00       201

avg / total       0.68      0.82      0.74      1242

Swing States Classifiers



In [207]:

    
#First slight dem and slight gop
X = cs_modeling[[0,1,2,3,4,5,6,7,8,9,10,11]]
y = cs_modeling[[29,30,31,32]]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
knn.fit(X_train, y_train)









    Out[207]:





KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=55, p=2,
           weights='uniform')



In [208]:

    
y_pred = knn.predict(X_test)



In [209]:

    
print knn.score(X_train,y_train)
print accuracy_score(y_test, y_pred)
print cross_val_score(knn, X_train, y_train, cv=5)
print(classification_report(y_test,y_pred))









    



0.856332703214
0.87969924812
[ 0.85849057  0.87735849  0.83962264  0.86792453  0.83809524]
             precision    recall  f1-score   support

          0       0.93      1.00      0.96       124
          1       0.00      0.00      0.00         9
          2       0.95      1.00      0.97       126
          3       0.00      0.00      0.00         7

avg / total       0.88      0.94      0.91       266

Medium Dem and GOP



In [210]:

    
#KNN for med_dem and med_gop
X = cs_modeling[[0,1,2,3,4,5,6,7,8,9,10,11]]
y = cs_modeling[[33,34,35,36]]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
y_pred = knn.predict(X_test)



In [211]:

    
knn.fit(X_train,y_train)









    Out[211]:





KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=55, p=2,
           weights='uniform')



In [212]:

    
print knn.score(X_train,y_train)
print accuracy_score(y_test, y_pred)
print cross_val_score(knn, X_train, y_train, cv=5)
print(classification_report(y_test,y_pred))









    



0.744801512287
0.781954887218
[ 0.80188679  0.69811321  0.74528302  0.73584906  0.74285714]
             precision    recall  f1-score   support

          0       0.98      1.00      0.99       130
          1       0.00      0.00      0.00         3
          2       0.80      1.00      0.89       107
          3       0.00      0.00      0.00        26

avg / total       0.80      0.89      0.84       266

Strong Dem and GOP



In [213]:

    
X = cs_modeling[[0,1,2,3,4,5,6,7,8,9,10,11]]
y = cs_modeling[[37,38,39,40]]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
y_pred = knn.predict(X_test)
knn.fit(X_train,y_train)









    Out[213]:





KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=55, p=2,
           weights='uniform')



In [214]:

    
print knn.score(X_train,y_train)
print accuracy_score(y_test, y_pred)
print cross_val_score(knn, X_train, y_train, cv=5)
print(classification_report(y_test,y_pred))









    



0.576559546314
0.428571428571
[ 0.47169811  0.59433962  0.5754717   0.56603774  0.59047619]
             precision    recall  f1-score   support

          0       0.93      1.00      0.96       124
          1       0.00      0.00      0.00         9
          2       0.50      1.00      0.66        66
          3       0.00      0.00      0.00        67

avg / total       0.56      0.71      0.61       266

Modeling for the "strong" counties of 25-50% is not that predictive.



In [215]:

    
## Random Forests

RFC for slight dem and slight gop



In [216]:

    
X = c_modeling[[0,1,2,3,4,5,6,7,8,9,10,11]]
y = c_modeling[[29,30,31,32]]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)



In [217]:

    
rfc.fit(X_train, y_train)









    Out[217]:





RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=5, max_features='auto', max_leaf_nodes=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)



In [218]:

    
y_pred = rfc.predict(X_test)



In [219]:

    
print knn.score(X_train,y_train)
print accuracy_score(y_test, y_pred)
print cross_val_score(knn, X_train, y_train, cv=5)
print(classification_report(y_test,y_pred))









    



0.327829238824
0.900161030596
[ 0.90342052  0.90140845  0.8832998   0.875       0.90120968]
             precision    recall  f1-score   support

          0       0.95      1.00      0.97       591
          1       0.00      0.00      0.00        30
          2       0.95      1.00      0.97       590
          3       0.00      0.00      0.00        31

avg / total       0.90      0.95      0.93      1242

RFC for medium dem and medium gop



In [220]:

    
X = c_modeling[[0,1,2,3,4,5,6,7,8,9,10,11]]
y = c_modeling[[33,34,35,36]]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)



In [221]:

    
rfc.fit(X_train, y_train)









    Out[221]:





RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=5, max_features='auto', max_leaf_nodes=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)



In [222]:

    
y_pred = rfc.predict(X_test)



In [223]:

    
print knn.score(X_train,y_train)
print accuracy_score(y_test, y_pred)
print cross_val_score(knn, X_train, y_train, cv=5)
print(classification_report(y_test,y_pred))









    



0.335884011277
0.811594202899
[ 0.82293763  0.81891348  0.81488934  0.83870968  0.83467742]
             precision    recall  f1-score   support

          0       0.95      1.00      0.97       589
          1       0.00      0.00      0.00        32
          2       0.86      1.00      0.93       536
          3       0.00      0.00      0.00        85

avg / total       0.82      0.91      0.86      1242

RFC for strong dem and strong gop



In [224]:

    
X = c_modeling[[0,1,2,3,4,5,6,7,8,9,10,11]]
y = c_modeling[[37,38,39,40]]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)



In [225]:

    
rfc.fit(X_train, y_train)









    Out[225]:





RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=5, max_features='auto', max_leaf_nodes=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)



In [226]:

    
y_pred = rfc.predict(X_test)



In [227]:

    
print knn.score(X_train,y_train)
print accuracy_score(y_test, y_pred)
print cross_val_score(knn, X_train, y_train, cv=5)
print(classification_report(y_test,y_pred))









    



0.418445428917
0.626409017713
[ 0.56740443  0.64788732  0.64788732  0.64717742  0.60685484]
             precision    recall  f1-score   support

          0       0.95      1.00      0.98       593
          1       0.00      0.00      0.00        28
          2       0.68      0.99      0.80       420
          3       0.33      0.01      0.02       201

avg / total       0.74      0.81      0.74      1242



In [228]:

    
# Just like in KNN, not the best classifier for "strong counties."

	Location	Diversity-Index	Black or African American alone, percent, 2013	American Indian and Alaska Native alone, percent, 2013	Asian alone, percent, 2013	Native Hawaiian and Other Pacific Islander alone, percent,	Two or More Races, percent, 2013	Hispanic or Latino, percent, 2013	White alone, not Hispanic or Latino, percent, 2013
0	Aleutians West Census Area, AK	0.769346	7.4	13.8	31.1	2.3	4.8	14.6	29.2
1	Queens County, NY	0.742224	20.9	1.3	25.2	0.2	2.7	28.0	26.7
2	Maui County, HI	0.740757	0.8	0.6	28.8	10.6	23.3	10.7	31.5
3	Alameda County, CA	0.740399	12.4	1.2	28.2	1.0	5.2	22.7	33.2
4	Aleutians East Borough, AK	0.738867	7.7	21.8	41.4	0.7	3.7	13.5	12.9

	state	county	est_pop_2015	pop_change_2015	int_mig_2015	dom_mig_2015	mig_2015
0	AL	Alabama	4858979	12568	5726	-2268	3458
1	AL	Autauga County	55347	57	19	-140	-121
2	AL	Baldwin County	203709	3996	221	3469	3690
3	AL	Barbour County	26489	-326	0	-281	-281
4	AL	Bibb County	22583	34	21	4	25

	votes_dem	votes_gop	total_votes	per_dem	per_gop	diff	per_point_diff	state_abbr	county_name	county_state	election_range	slight_dem	slight_gop	med_dem	med_gop	strong_dem	strong_gop
29	5908.0	18110.0	24661.0	23.956855	73.435789	12,202	49.48	AL	Autauga County	Autauga County, AL	-49.478934	False	False	False	False	False	True
30	18409.0	72780.0	94090.0	19.565310	77.351472	54,371	57.79	AL	Baldwin County	Baldwin County, AL	-57.786162	False	False	False	False	False	False
31	4848.0	5431.0	10390.0	46.660250	52.271415	583	5.61	AL	Barbour County	Barbour County, AL	-5.611165	False	True	False	False	False	False
32	1874.0	6733.0	8748.0	21.422039	76.966164	4,859	55.54	AL	Bibb County	Bibb County, AL	-55.544124	False	False	False	False	False	False
33	2150.0	22808.0	25384.0	8.469902	89.851875	20,658	81.38	AL	Blount County	Blount County, AL	-81.381973	False	False	False	False	False	False

	county_state	state	county	est_pop_2015	pop_change_2015	int_mig_2015	dom_mig_2015	mig_2015	FIPS Code	State	...	per_point_diff	state_abbr	county_name_r	election_range	slight_dem	slight_gop	med_dem	med_gop	strong_dem	strong_gop
0	Abbeville County, SC	SC	Abbeville County	24932	6	22	-12	10	45001.0	SC	...	28.25	SC	Abbeville County	-28.254383	False	False	False	False	False	True
1	Acadia Parish, LA	LA	Acadia Parish	62577	79	32	-281	-249	22001.0	LA	...	56.67	LA	Acadia Parish	-56.674943	False	False	False	False	False	False
2	Accomack County, VA	VA	Accomack County	32973	-25	81	-53	28	51001.0	VA	...	11.71	VA	Accomack County	-11.710568	False	False	False	True	False	False
3	Ada County, ID	ID	Ada County	434211	7364	933	3838	4771	16001.0	ID	...	9.24	ID	Ada County	-9.239878	False	True	False	False	False	False
4	Adair County, IA	IA	Adair County	7228	-189	0	-161	-161	19001.0	IA	...	35.36	IA	Adair County	-35.355148	False	False	False	False	False	True

	votes_dem	votes_gop	total_votes	per_dem	per_gop	per_point_diff	election_range
count	3.112000e+03	3112.000000	3.112000e+03	3112.000000	3112.000000	3112.000000	3112.000000
mean	2.006065e+04	19622.378856	4.174537e+04	31.708228	63.613409	39.233014	-31.905181
std	7.199807e+04	40442.737492	1.134048e+05	15.358601	15.651728	20.793041	30.883786
min	4.000000e+00	57.000000	6.400000e+01	3.144654	4.122067	0.040000	-91.636364
25%	1.166000e+03	3206.000000	4.820500e+03	20.475924	54.947846	22.467500	-54.689887
50%	3.153000e+03	7164.500000	1.094700e+04	28.473862	66.743096	40.315000	-38.217390
75%	9.608500e+03	17448.250000	2.879650e+04	39.999326	75.147062	55.462500	-14.876874
max	1.893770e+06	620285.000000	2.652072e+06	92.846592	95.272727	91.640000	88.724525

	est_pop_2015	pop_change_2015	per_hs_diploma_only_2011_15	per_four_or_higher_2011_15	div_index	af_am	native_2013	asian_am	pac_am	two_or_more_races	...	per_gop	diff	per_point_diff	election_range	slight_dem	slight_gop	med_dem	med_gop	strong_dem	strong_gop
0	24932	6	37.5	12.3	0.445417	28.2	0.3	0.4	0.0	1.3	...	62.868333	3,030	28.25	-28.254383	False	False	False	False	False	True
1	62577	79	39.2	10.5	0.355956	18.3	0.3	0.4	0.0	1.3	...	77.262105	15,521	56.67	-56.674943	False	False	False	False	False	False
2	32973	-25	39.9	18.8	0.539878	28.0	0.6	0.6	0.2	1.5	...	54.471596	1,845	11.71	-11.710568	False	False	False	True	False	False
3	434211	7364	21.4	37.1	0.256622	1.3	0.8	2.6	0.2	2.6	...	47.931611	18,072	9.24	-9.239878	False	True	False	False	False	False
4	7228	-189	44.7	15.3	0.054921	0.2	0.1	0.4	0.0	0.7	...	65.336526	1,329	35.36	-35.355148	False	False	False	False	False	True

	slight_dem_False	slight_gop_False	slight_gop_True
0	1.0	1.0	0.0
1	1.0	1.0	0.0
2	1.0	1.0	0.0
3	1.0	0.0	1.0
4	1.0	1.0	0.0

	slight_dem_False	slight_gop_False	slight_gop_True
0	1.0	1.0	0.0
1	1.0	1.0	0.0
2	1.0	1.0	0.0
3	1.0	0.0	1.0
4	1.0	1.0	0.0

	slight_dem_False	slight_gop_False	slight_gop_True
0	1.0	1.0	0.0
1	1.0	1.0	0.0
2	1.0	1.0	0.0
3	1.0	0.0	1.0
4	1.0	1.0	0.0